[None][fix] fix PEFT page accumulation in MaxUtilizationPolicy scheduler #13528
achartier wants to merge 2 commits into NVIDIA:main
Conversation
In MaxUtilizationPolicy._try_scheduling_request, num_scheduled_peft_pages was passed by value and never returned, so the caller's accumulator stayed at 0 across all requests. As a result, the per-batch PEFT page cap was checked against 0 already-scheduled pages for every request, potentially over-admitting LoRA tasks beyond the device cache limit. The fix mirrors the C++ reference (capacityScheduler.cpp): return the updated count from _try_scheduling_request and accumulate it in the caller. A unit test is added that fails before the fix (3 tasks admitted when only 2 fit within max_pages=25) and passes after. Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
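The failure mode above can be sketched in isolation. This is a simplified stand-in for the real TensorRT-LLM code, not the actual scheduler: Python ints are immutable, so rebinding a parameter inside `_try_scheduling_request` never changes the caller's accumulator, and the fix is to return the updated count.

```python
PAGES_PER_REQUEST = 10  # values taken from the PR's test scenario
MAX_PEFT_PAGES = 25

def try_schedule_buggy(request, num_scheduled_peft_pages):
    if num_scheduled_peft_pages + PAGES_PER_REQUEST > MAX_PEFT_PAGES:
        return False
    num_scheduled_peft_pages += PAGES_PER_REQUEST  # rebinds the local only; lost on return
    return True

def try_schedule_fixed(request, num_scheduled_peft_pages):
    # Return the updated count so the caller can accumulate it.
    if num_scheduled_peft_pages + PAGES_PER_REQUEST > MAX_PEFT_PAGES:
        return False, num_scheduled_peft_pages
    return True, num_scheduled_peft_pages + PAGES_PER_REQUEST

def schedule_buggy(requests):
    fitting, pages = [], 0
    for r in requests:
        if try_schedule_buggy(r, pages):  # pages stays 0 for every request
            fitting.append(r)
    return fitting

def schedule_fixed(requests):
    fitting, pages = [], 0
    for r in requests:
        ok, pages = try_schedule_fixed(r, pages)  # accumulate the returned count
        if ok:
            fitting.append(r)
    return fitting
```

With three requests, the buggy loop admits all 3 (the cap is always checked against 0), while the fixed loop admits only the 2 that fit within 25 pages, matching the failing/passing behavior described in the PR.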
📝 Walkthrough
The changes modify the scheduler's PEFT page tracking mechanism.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unittest/_torch/executor/test_py_scheduler.py (1)
Lines 2076-2100: Regression coverage is solid, and QA integration test-list updates are unnecessary for this unit-test-only change. This test cleanly reproduces the accumulation bug and verifies the intended cap behavior under MAX_UTILIZATION. As per coding guidelines, if a PR only touches unittest or narrow unit scope, QA integration test-list updates are unnecessary or optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/executor/test_py_scheduler.py` around lines 2076 - 2100, The test test_max_utilization_peft_page_accumulation demonstrates that PyCapacityScheduler under CapacitySchedulerPolicy.MAX_UTILIZATION must accumulate PEFT page usage across admitted requests; ensure MockPeftCacheManager(max_pages=25, pages_per_request=10) behavior is used by PyCapacityScheduler.schedule_request so that cumulative pages from admitted tasks are tracked and the third request is rejected. Locate the test test_max_utilization_peft_page_accumulation and the scheduling logic in PyCapacityScheduler.schedule_request and confirm the scheduler queries peft_cache_manager (MockPeftCacheManager) for current allocated pages and adds the candidate request's pages_per_request before admitting; if accumulation is missing, update schedule_request to sum existing allocated PEFT pages plus the new request's pages and enforce max_pages when deciding to admit or reject. Ensure the test asserts len(fitting) == 2 still passes after the change.
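The scenario that prompt describes can be sketched standalone. The MockPeftCacheManager fields and the scheduling loop below are simplified stand-ins inferred from the review comments, not the real PyCapacityScheduler API:

```python
class MockPeftCacheManager:
    """Simplified stand-in: a fixed page budget and a per-request page cost."""
    def __init__(self, max_pages, pages_per_request):
        self.max_pages = max_pages
        self.pages_per_request = pages_per_request

def schedule_with_peft_cap(requests, peft_mgr):
    fitting, scheduled_pages = [], 0
    for req in requests:
        needed = scheduled_pages + peft_mgr.pages_per_request
        if needed > peft_mgr.max_pages:
            continue  # admitting this request would exceed the device cache limit
        scheduled_pages = needed  # accumulate across admitted requests
        fitting.append(req)
    return fitting

mgr = MockPeftCacheManager(max_pages=25, pages_per_request=10)
fitting = schedule_with_peft_cap(["r0", "r1", "r2"], mgr)
```

With max_pages=25 and 10 pages per request, two requests consume 20 pages and a third would need 30, so only two are admitted — the invariant the regression test asserts via len(fitting) == 2.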
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unittest/_torch/executor/test_py_scheduler.py`:
- Line 2097: The test unpacks three values from scheduler.schedule_request into
fitting, disagg, paused but only fitting is used; rename the unused unpacked
variables to underscore-prefixed names (e.g., _disagg and _paused) to satisfy
RUF059. Update the unpacking at the call to scheduler.schedule_request([r0, r1,
r2]) so it assigns fitting, _disagg, _paused and leave all other references to
fitting unchanged.
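The RUF059 fix above is mechanical; a minimal illustration follows, where the schedule_request stub is hypothetical and merely mimics the real call site's three-tuple return:

```python
def schedule_request(reqs):
    # Hypothetical stub returning (fitting, disagg, paused) like the real call site.
    return list(reqs), [], []

# RUF059 flags unpacked names that are never used; prefixing them with an
# underscore marks them as intentionally unused and silences the lint.
fitting, _disagg, _paused = schedule_request(["r0", "r1"])
```

Only fitting is referenced afterwards, so the remaining code is unchanged by the rename.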
📒 Files selected for processing (2)
- tensorrt_llm/_torch/pyexecutor/scheduler/scheduler.py
- tests/unittest/_torch/executor/test_py_scheduler.py
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
/bot run --disable-fail -fast

PR_Github #45948 Bot args parsing error: usage: /bot [-h]

/bot run --disable-fail-fast

PR_Github #45950 [ run ] triggered by Bot. Commit:

PR_Github #45950 [ run ] completed with state
Summary by CodeRabbit
Bug Fixes
- Corrected PEFT page accounting in the MAX_UTILIZATION capacity scheduler so LoRA tasks are not over-admitted beyond the device cache limit.
Tests
- Added a regression test covering PEFT page accumulation across scheduled requests.
Description
Test Coverage
Added new test test_max_utilization_peft_page_accumulation in tests/unittest/_torch/executor/test_py_scheduler.py.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.